Machine learning needs better tools - Replicate
Machine learning used to be an academic pursuit. If you wanted to work on it, you probably needed to be part of a lab or have a PhD. In early 2021, there was a shift: RiversHaveWings followed up with the VQGAN+CLIP notebook, part of a wave of Colab notebooks that turned text descriptions into images by guiding a GAN with CLIP.
AutoML
This may revolutionize data science: we introduce TabPFN, a new tabular data classification method that takes 1 second & yields SOTA performance (competitive with the best AutoML pipelines given an hour). So far, it is limited in scale, though: it can only tackle problems with up to 1000 training examples, 100 features and 10 classes. TabPFN is radically different from previous ML methods. It is a meta-learned algorithm and it provably approximates Bayesian inference with a prior based on principles of causality and simplicity. TabPFN happens to be a single transformer, but this is not the usual "trees vs nets" battle.
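The published `tabpfn` package exposes this through a scikit-learn-style interface. Below is a minimal sketch of what a call would look like, wrapped in a function so nothing runs unless the package is installed; the function name is my own and the default constructor arguments are assumed:

```python
# Hedged sketch: TabPFN via the `tabpfn` package's scikit-learn-style API.
# Wrapped in a function so the import only happens where the package exists.
def tabpfn_predict(X_train, y_train, X_test):
    from tabpfn import TabPFNClassifier

    # A single pre-trained transformer: "fit" stores the training set as
    # context, and prediction is one forward pass -- no gradient training.
    # The stated limits apply: <=1000 rows, <=100 features, <=10 classes.
    clf = TabPFNClassifier()
    clf.fit(X_train, y_train)
    return clf.predict(X_test)
```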
Train a Custom Object Detector with Detectron2 and FiftyOne
Combine the dataset curation of FiftyOne with the model training of Detectron2 to easily train custom detection models.

[Image 71df582bfb39b541 from the Open Images V6 dataset (CC-BY 2.0) visualized in FiftyOne]

In recent years, every aspect of the Machine Learning (ML) lifecycle has had tooling developed to make it easier to bring a custom model from an idea to a reality. The most exciting part is that the community has a propensity for open-source tools, like PyTorch and TensorFlow, allowing the model development process to be more transparent and replicable.

In this post, we take a look at how to integrate two open-source tools tackling different parts of an ML project: FiftyOne and Detectron2. Detectron2 is a library developed by Facebook AI Research designed to allow you to easily train state-of-the-art detection and segmentation algorithms on your own data. FiftyOne is a toolkit designed to let you easily visualize your data, curate high-quality datasets, and analyze your model results.

Together, you can use FiftyOne to curate your custom dataset, use Detectron2 to train a model on your FiftyOne dataset, then evaluate the Detectron2 model results back in FiftyOne to learn how to improve your dataset, continuing the cycle until you have a high-performing model.
This post closely follows the official Detectron2 tutorial, augmenting it to show how to work with FiftyOne datasets and evaluations.

Follow along in Colab! Check out this notebook to follow along with this post right in your browser.

Setup

To start, we’ll need to install FiftyOne and Detectron2.

# Install FiftyOne
pip install fiftyone

# Install Detectron2 from source (other options available)
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# (add --user if you don't have permission)

# Or, to install it from a local clone:
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

# On macOS, you may need to prepend the above commands with a few environment variables:
CC=clang CXX=clang++ ARCHFLAGS="-arch x86_64" python -m pip install ...

Now let’s import FiftyOne and Detectron2 in Python.

Prepare the Dataset

In this post, we show how to use a custom FiftyOne Dataset to train a Detectron2 model. We’ll train a license plate segmentation model from an existing model pre-trained on the COCO dataset, available in Detectron2’s model zoo.

Since the COCO dataset doesn’t have a “Vehicle registration plate” category, we will be using segmentations of license plates from the Open Images v6 dataset in the FiftyOne Dataset Zoo to train the model to recognize this new category.

Note: Images in the Open Images v6 dataset are under the CC-BY 2.0 license.

For this example, we will just use some of the samples from the official “validation” split of the dataset.
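Concretely, the download-and-filter step might look like the sketch below. FiftyOne's `load_zoo_dataset` and `filter_labels` are real APIs, but the exact arguments, the `segmentations` field name, and the `max_samples` value are my assumptions rather than the exact code from the post; it's wrapped in a function so it only runs where FiftyOne is installed:

```python
# Hedged sketch: download license-plate segmentations from Open Images v6
# via the FiftyOne Dataset Zoo, then keep only the target labels.
def prepare_license_plate_view():
    import fiftyone.zoo as foz
    from fiftyone import ViewField as F

    dataset = foz.load_zoo_dataset(
        "open-images-v6",
        split="validation",
        label_types=["segmentations"],
        classes=["Vehicle registration plate"],
        max_samples=500,  # my choice; more data -> better model, longer training
    )
    # Samples may still carry other labels; filter down to just license plates
    return dataset.filter_labels(
        "segmentations", F("label") == "Vehicle registration plate"
    )
```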
To improve model performance, we could always add in more data from the official “train” split as well, but that would take longer to train, so we’ll just stick to the “validation” split for this walkthrough.

Specifying a list of classes when downloading a dataset from the zoo will ensure that only samples with at least one of the given classes will be present. However, these samples may still contain other labels, so we can use the powerful filtering capability of FiftyOne to easily keep only the “Vehicle registration plate” labels. We will also untag these samples as “validation” and create our own splits out of them.

Next, we need to parse the dataset from FiftyOne’s format to Detectron2's format so that we can register it in the relevant Detectron2 catalogs for training. This is the most important code snippet for integrating FiftyOne and Detectron2.

Note: In this example, we are specifically parsing the segmentations into bounding boxes and polylines. This function may require tweaks depending on the model being trained and the data it expects.

Let’s visualize some of the samples to make sure everything is being loaded properly:

[Visualizing the Open Images V6 training dataset in FiftyOne (image by author)]

Load the Model and Train!

Following the official Detectron2 tutorial, we now fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the FiftyOne dataset.
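For orientation, the fine-tuning setup from the official Detectron2 tutorial looks roughly like this. It is a sketch, not the exact gist from the post: the registered dataset name, iteration count, and learning rate are placeholders, and it's wrapped in a function so it only runs where Detectron2 is installed:

```python
# Hedged sketch: fine-tune a COCO-pretrained R50-FPN Mask R-CNN with
# Detectron2's DefaultTrainer on a registered FiftyOne-backed dataset.
def train_license_plate_model(train_dataset_name="fiftyone_train"):
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultTrainer

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
    cfg.DATASETS.TRAIN = (train_dataset_name,)  # name used at registration time
    cfg.DATASETS.TEST = ()
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # just "Vehicle registration plate"
    cfg.SOLVER.MAX_ITER = 300            # placeholder; tune for your data
    cfg.SOLVER.BASE_LR = 0.00025         # placeholder learning rate

    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()
```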
This will take a couple of minutes to run if using the linked Colab notebook.

# Look at training curves in tensorboard:
tensorboard --logdir output

[Tensorboard training metrics visualization (image by author)]

Inference & evaluation using the trained model

Now that the model is trained, we can run it on the validation split of our dataset and see how it performs! To start,
How to get started with Coqui's open source on-device speech to text tool
I think the transformative power of on-device speech to text is criminally under-rated (and I'm not alone), so I'm a massive fan of the work Coqui are doing to make the technology more widely accessible. Coqui is a startup working on a complete open source solution to speech recognition, as well as text to speech, and I've been lucky enough to collaborate with their team on datasets like Multilingual Spoken Words. They have great documentation already, but over the holidays I've been playing around with the code, and I always like to leave a trail of breadcrumbs if I can, so in this post I'll try to show you how to get speech recognition running locally yourself in just a few minutes. I've tried it on my Pop!_OS 21.04 laptop, but it will hopefully work on most modern Linux distributions, and should be trivial to modify for the other platforms that Coqui provides binaries for. To accompany this post, I've also published a Colab notebook, which you can use from your browser on almost any system and which demonstrates all these steps.
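As a taste of what that looks like in Python: Coqui's `stt` package expects 16-bit, 16 kHz mono PCM samples. Below is a stdlib-plus-NumPy sketch of loading a WAV file into that form; the commented lines show the assumed inference call, with the model filename as a placeholder:

```python
import wave

import numpy as np

def load_audio(path):
    """Read a WAV file into the raw int16 buffer Coqui STT expects."""
    with wave.open(path, "rb") as w:
        # Coqui's released English models are trained on 16 kHz mono audio
        assert w.getframerate() == 16000 and w.getnchannels() == 1
        return np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

# Assumed usage once the `stt` package and a downloaded model are in place:
# from stt import Model
# model = Model("model.tflite")
# print(model.stt(load_audio("audio.wav")))
```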
- Information Technology > Software (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Use Google Colab Like A Pro
Originally published on Towards AI, the world's leading AI and technology news and media company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. Regardless of whether you're a Free, Pro, or Pro+ user, we all love Colab for the resources and the ease of sharing it makes available to all of us.
- Media (0.55)
- Information Technology > Services (0.30)
Python Practice Problems for Beginner Coders - I School Online
From sifting through Twitter data to making your own Minecraft modifications, Python is one of the most versatile programming languages at a coder's disposal. The open-source, object-oriented language is also quickly becoming one of the most-used languages in data science. According to the Association for Computing Machinery, Python is now the most popular introductory language at universities in the United States. To help readers practice the Python fundamentals, datascience@berkeley gathered six coding problems, including some from the W200: Introduction to Data Science Programming course. Consider the following questions to make sure you have the proper prior knowledge and coding environment to continue.
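In that spirit, here's a small warm-up of my own (not one of the six problems from the course) that exercises the same fundamentals of loops, strings, and dictionaries:

```python
# Warm-up: count how often each word appears in a sentence.
def word_counts(text):
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

print(word_counts("the quick brown fox jumps over the lazy dog"))
# → {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1}
```

If you can write and explain something like this without looking anything up, you have the prior knowledge the course problems assume.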
Running Redis on Google Colab - KDnuggets
Google Colab is a popular browser-based environment for executing Python code on hosted Jupyter notebooks and training models for machine learning, including free access to GPUs! It is a great platform for data scientists and machine learning (ML) engineers to learn and quickly develop ML models in Python. Redis is an in-memory open source database that is increasingly being used in machine learning - from caching, messaging, and fast data ingest to semantic search and online feature stores. In fact, NoSQL databases - and specifically Redis - were named by Ben Weber, Director of Applied Data Science at Zynga, among the 8 new tools he learned as a data scientist in 2020. Because of the increasing use of Redis for data science and machine learning, it is very handy to be able to run Redis directly from your Google Colab notebook!
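To make that concrete: once redis-server is installed and launched in the Colab VM (e.g. `!apt-get install -y redis-server` followed by `!redis-server --daemonize yes` — commands I'd expect to work on Colab's Ubuntu image, not taken from the article), any client can reach it on localhost:6379. A stdlib sketch that speaks just enough of the RESP protocol to health-check the server:

```python
import socket

def redis_ping(host="127.0.0.1", port=6379, timeout=1.0):
    """Send a raw RESP PING; True iff a Redis server answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"*1\r\n$4\r\nPING\r\n")  # RESP array of one bulk string
            return s.recv(16).startswith(b"+PONG")
    except OSError:
        return False

print(redis_ping())  # False unless a Redis server is running locally
```

In a notebook you'd normally use the redis-py client (`pip install redis`) instead; the raw-socket version just shows there's no magic between Colab and the server.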
StyleCLIPDraw: Text-to-Drawing Synthesis with Artistic Control
I explain Artificial Intelligence terms and news to non-experts. Have you ever dreamed of taking the style of a picture, like this cool TikTok drawing style on the left, and applying it to a new picture of your choice? Well, I did, and it has never been easier to do. In fact, you can even achieve that from only text and can try it right now with this new method and their Google Colab notebook available for everyone (see references). Simply take a picture of the style you want to copy, enter the text you want to generate, and this algorithm will generate a new picture out of it!
Chapter 3: Transfer Learning with ResNet50 -- from Dataloaders to Training
I was given Xray baggage scan images by an airport to develop a model that performs automatic detection of dangerous objects (gun and knife). Given only a small amount of Xray images, I am using Domain Adaptation: first collecting a large number of normal (non-Xray) images of dangerous objects from the internet, training a model using only those normal images, then adapting the model to perform well on Xray images. In my previous post, I talked about the iterative data collection process for web images of gun and knife to be used for domain adaptation. In this post, I will discuss transfer learning with ResNet50 using the scraped web images. For now, we won't worry about the Xray images and will only focus on training the model with the web images. To read this post, it's recommended to have some knowledge of how to apply transfer learning using a model pre-trained on ImageNet in PyTorch. I won't explain every step in detail, but will share some useful tips that can answer questions like: